Online Expansion of Largescale Data Warehouses

Authors

  • Jeffrey Cohen
  • John Eshleman
  • Brian Hagenbuch
  • Joy Kent
  • Christopher Pedrotti
  • Gavin Sherry
  • Florian Waas
Abstract

Modern data warehouses store exceedingly large amounts of data, generally considered the crown jewels of an enterprise. The amount of data maintained in such data warehouses increases significantly over time, often at a continuous pace, e.g., by gathering additional data or retaining data for longer periods to derive additional business value, but occasionally also precipitously, e.g., when consolidating disparate data warehouses and data marts into a single database. Expanding a data warehouse holding hundreds of terabytes of data by a substantial portion, e.g., 100% or more, is a complex and disruptive maintenance operation because it typically involves some form of dumping and reloading of data, which requires substantial downtime. In this paper, we describe the methodology and mechanisms we developed in Greenplum Database to expand large-scale data warehouses in an online fashion, i.e., without noticeable downtime. At the core of our approach is a set of robust and transactionally consistent primitives that enable efficient data movement. Special emphasis was put on usability and control, letting an administrator tailor the expansion process to specific operational characteristics via priorities and schedules. We present a number of experiments to quantify the impact of an ongoing expansion on query workloads.

Similar Resources

Issues for On-Line Analytical Mining of Data Warehouses

Data warehouses and OLAP engines are expected to be widely available in the near future. The data in data warehouses has been cleansed, integrated, and preprocessed, and infrastructures have been built surrounding data warehouses for efficient data analysis. Therefore, data warehouses or OLAP databases are expected to be a major platform for data mining in the future. We discuss the issues relate...

Meta Cube-X: An XML Metadata Foundation for Interoperability Search among Web Data Warehouses

OLAP (Online Analysis Processing) applications have very special requirements to the underlying multidimensional data that differs significantly from other areas of application (e.g. the existence of highly structured dimensions). In addition, providing access and search among multiple, heterogeneous, distributed and autonomous data warehouses, especially web warehouses, has become one of the l...

Online Data Mining

INTRODUCTION Currently, most data warehouses are being used for summarization-based, multi-dimensional, online analytical processing (OLAP). However, given the recent developments in data warehouse and online analytical processing technology, together with the rapid progress in data mining research, industry analysts anticipate that organizations will soon be using their data warehouses for soph...

ASM Ground Model and Refinement for Data Warehouses

Data Warehouses and on-line analytical processing (OLAP) systems are a promising area for the application of Abstract State Machines (ASMs). In this paper a ground model specification for data warehouses is sketched that is based on the fundamental idea of separating input from operational databases and output to OLAP systems. On this basis we start defining formal refinement rules for such sys...

Data Warehousing Applications: an Analytical Tool for Decision Support System

Data-driven decision support systems, such as data warehouses, can serve the requirement of extracting information from more than one subject area. Data warehouses standardize the data across the organization so as to have a single view of information. Data warehouses (DW) can provide the information required by the decision makers. The data warehouse supports an on-line analytical processing...

Journal:
  • PVLDB

Volume 4

Publication date: 2011